Artificial Perception of Actions

Author

Robert H. Thibadeau

Abstract

This paper concerns the visual perception of actions that are discretely conceptualized. The intent is to develop a vision system that produces causal or intentional descriptions of actions, thus providing the conceptual underpinnings of natural language descriptions. The computational theory is developed by linking a "point of action definition" analysis to an analysis of how physical events elicit appropriate verbal descriptions. Out of this theory of direct computational linkages between physical events, points of action definition, and verbal descriptions comes a theory of perception that provides some insight into how to construct systems that can watch the world and report on what they are watching.

The aim of this research is to develop a machine that interprets visually observed events as discrete actions. Every discrete action is a momentary causal or intentional description of movement. To ensure the system’s veracity and usefulness, the description, and thereby the "perception of action", must provide a faithful mapping to natural language. For example, the machine might verbally report that some person is now walking toward the robot, that a person has left his station hurriedly, that a machine operator is pushing the wrong switch, that a person has been walking down the corridor trying to open locked doors, or that someone is running down the street. In effect, the machine must see everyday actions in a way that permits their ready communication to us.

There is hardly any active research on the problem of computing such perceptions. On the contrary, it is particularly easy to find papers that assert or strongly imply that the study of action perception must wait until the results are in on object perception and motion perception. This view is even expressed by researchers who will admit that motion perception can be studied without solving the problems of object perception.
It is my thesis that actions are perceived directly without necessarily interpreting objects or motions. I would like to understand how to account computationally for action perception without also accounting for all its ostensible precursors, like object and motion perception. Just as, in a computational sense, a motion can be perceived without first recognizing the moving object, it is possible that an action can be perceived without first recognizing the underlying motions. Such a relationship should not be taken to deny indirect computation from object and motion information, only from the conceptualization of objects and motions. The present strategy approaches action perception intensely "bottom up" and "top down", seeking always to discover the most direct relationships that hold between the physical events and the verbal conceptions.

To begin the analysis, we need to distinguish what is conventionally meant by the study of "action perception" from the studies of related perceptual phenomena such as "motion perception", "the perception of causality," and others.

Action Perception as a Study

Michotte’s studies of the perception of causality (Michotte, 1963) represent some of the best known and earliest work of relevance to action perception. Michotte devoted his life to classifying physical conditions that elicit perceived causation. He discovered that perceived causality is based upon a few distinct sensations: there are only a few ways to perceive causation. By contrast, action perception is, by convention, not about distinct sensations of action, but about perceiving individual actions. The perception of causality is important in the perception of many actions, but it may not be involved in the perception of all actions.

Motion perception is often believed to encompass action perception, but the literature on motion perception deals with the perception of physical motions which are not inherently discrete.
The physical motions are algebraically characterized by a cyclic or, at best, continuous function with no natural beginning or ending (see Johannson, 1975; Restle, 1979; Ullman, 1979 and below). A classic problem, described in Ullman (1979), is to understand the computation involved in merging spatially separated motions together into the perception of a single coordinated motion (e.g., seeing an arm movement as a coordinated movement of one object). Motion perception and object perception then become closely related topics. By contrast, action perception is further removed from object perception, per se, and is involved with discrete motions of objects. In action perception, the emphasis is as much on the conditions for the beginning and ending of motions as it is on the motions themselves.

One area of research associated with action perception is natural language understanding. A typical problem in language understanding is the understanding of natural language descriptions of actions at a "conceptual" level. There is relevant work on basic action categories (primitives; see Schank, 1975, and Miller and Johnson-Laird, 1976) in which the aim is to compose complex actions out of simpler, more primitive, ones. There is also relevant natural language work on recognizing the significance of action sequences (Schank and Abelson, 1977; Schmidt, Sridharan and Goodson, 1978). The work on recognizing the significance of action sequences is important owing to the "hierarchical" nature of action. An action, although perceived as a single thing, is typically made up of finer actions concatenated together (e.g., body movements are a component of walking). The finer actions can be individually perceived if the perceiver has the opportunity, and wishes, to see them.
The work from natural language understanding claims to describe a "language-free" conceptual level of representation which, by the claim, should bear on the understanding of action perception. Unfortunately, it is not clear that "language-free" levels of representation have yet been described. Present knowledge about conceptual representation de facto derives from research grounded in the uses of natural language. It is true that the peripheral taxonomic structures of language, such as syntax and morphology, are distinguished from conceptual, or ideational, structures, but, from the point of view of the present research, what a linguist, or even a logician, may call ideational structures look, because of the derivation, like language structures as well. The same may be said for "conceptual representations" or "propositional representations" in AI. A major goal of the present research is to provide some independently motivated mapping or derivation between visual perception and our understanding of conceptual representation.

The study of "action perception" has been most developed by social psychologists, most prominently Heider (Heider, 1958; Heider and Simmel, 1944), although I focus on Newtson (Massad, Hubbard and Newtson, 1979; Newtson, 1973; Newtson, 1976; Newtson and Engquist, 1976; Newtson, Engquist and Bois, 1977; Newtson and Rindner, 1979; Newtson, Rindner, Miller and LaCross, 1978; Wilder, 1978, 1978). The interest in action perception stems from a struggle to understand how people attribute intention and causation to movements. Newtson’s work is of particular interest to me because it provides an empirical, objective, and observational means of coming to terms with the process of perceiving actions. In his basic paradigm, informants watch an action sequence and indicate where they perceive actions as having occurred. Specifically, they are asked to push a button every time that, in their opinion, an action occurs in the film or video-tape they watch.
The researcher records the times of the button presses for later examination with regard to the activities in the film.

Newtson’s "point of action definition" provides the impetus for my study of the artificial perception of actions. The premise in using a psychological basis is that people perceive actions in a fashion desired of the machine. Points of action definition provide more than simply an idea about how others perceive actions. Most important, they provide timely clues about an ongoing process of perceiving. The alternatives, verbal reports and subjective analysis, are either not timely or just immensely difficult. A point of action definition is a simple and direct report from a person about when the action is seen. It turns out that the explicit point of action definition associated with verbal report and the observed physical event reveals critical relationships between the percept and the physical event. An understanding of these temporal and substantive relations is the basis for a computational understanding.

Newtson’s conclusions about action perception were acquired in over ten years of research. I cannot summarize all of that research here. But one particularly relevant result is that people normally segment events into actions, even when they are not required to do so by instruction. However, people are remarkably unaware of their segmentations. A commonplace example is the film montage effect: cutting from shot to shot without confusing the viewer or making the viewer aware of the cuts. Film editors know that cuts must be made at points of action definition. A final relevant conclusion from the research findings is that action perception is susceptible to strong forces of expectation: viewers must be prepared to see an action in order for them to see it.
The nature of this preparation is not well understood, but it is clear that a preparation or expectation system of significant conceptual impact operates in the normal perception of actions.

"Actions" in Visual Perception

This section focuses on the general problem of deriving knowledge representations of perceived actions. It begins with a conventional logical and linguistic (logico-linguistic) derivation and then develops a perceptual derivation of knowledge representations to take account of bottom up cues from the visual stream.

Top Down

There are many schemes for taxonomies of conceptual actions which attempt to simplify the representation of actions for conceptual understanding and inferencing purposes. While almost all existing schemes pay little attention to how actions might be recognized perceptually, many carefully attend to the conceptual representation of actions. Schank’s (1975) work with primitives in narrative prose is a familiar example and Miller and Johnson-Laird (1976) provide another. The work by Miller and Johnson-Laird is compelling because, although their technique was based on logico-linguistic analysis, their aim was to be accountable to visual perception as well as language perception [1]. To see how their logico-linguistic analysis works, and how it might work for visual perception, we can examine their analysis of some primitive actions.

The most primitive action in my system is loosely verbalized as "travel." This action was tentatively assigned the definition provided formally in (1). The notation has three parts: the proposition, TRAVEL(x), its presuppositions [2] (such as the presupposition that there is a place y), and its entailments [3] (as in (i)). The time indicator t is conventional. Miller and Johnson-Laird’s use of the statement forming operator $R_t$ (Rescher and Urquhart, 1971) is possibly novel: $R_t$ corresponds to "this statement is recognized as true at time t". Another statement forming operator, $Q_t$, is used later and it means that "this statement is recognized as true at all moments prior to t".

[1] Miller and Johnson-Laird (1976) provide an informal computational framework, but that computational framework will not be discussed in this paper. Their formal analysis of natural language and conceptual structure is, nevertheless, a classic contribution to our knowledge for its methodology, its depth, and its scope.

[2] "According to Strawson (1952), statement S semantically presupposes statement S’ if S’ is a necessary condition for the truth or falsity of S." Page 172 of Miller and Johnson-Laird (1976) suggests a willingness to accept a pragmatic or interpersonal version of this definition which depends on belief states.

(1) TRAVEL(x): Something x "travels" if there is a time t and a place y such that:
    (i) Either: $R_{t-i}(\mathrm{notAT}(x,y))$ & $R_t(\mathrm{AT}(x,y))$
        Or: $R_{t-i}(\mathrm{AT}(x,y))$ & $R_t(\mathrm{notAT}(x,y))$

The formulation in (1) pertains to conceptual representation, but I believe it superior to other conceptual formulations, such as Schank’s PTRANS, in that it stipulates logical presuppositions and entailments for primitive actions. I will later evaluate a method of action recognition that seeks to match such semantic "necessities" against visual information. The proposition, TRAVEL(x), nevertheless, lacks the place descriptor y, suggesting its natural language or conceptual function. In reasoning and understanding, the particular place from which or to which the object moves is often not relevant.

Miller and Johnson-Laird (1976) state through presupposition and entailment how one may elicit the perception of an action. The statement forming operator may tell us when the conceptualization is recognized.
If the conceptualization is an action conceptualization recognized at a single point in time, then we have a natural way of predicting a point of action definition. But, in fact, statement forming operators account for verb tense and aspect in Miller and Johnson-Laird (1976) and may only accidentally predict points of action definition.

This first version of TRAVEL has problems with the "point of action definition" property. The statement forming constraints specify two points of recognition, $R_t$ and $R_{t-i}$. Furthermore, the entailment is ambiguous in point of definition: the point of definition is either the moment when the object moves from a place or the moment when the object arrives at a place.

For logico-linguistic reasons, Miller and Johnson-Laird were not satisfied with this first definition either and went on to develop their final formulation in (1’). They wanted to derive that "if x does not travel, it stays where it was." They also felt TRAVEL should have a durative entailment. They did not want to allow, as (1) seems to imply, that TRAVEL is defined when objects just disappear or appear at locations.

(1’) TRAVEL(x): Something x "travels" from time $t_0$ to time $t_m$ if, for each $t_i$ such that $t_0 \le t_i \le t_m$, there is a place $y_i$ such that $R_{t_i}(\mathrm{AT}(x,y_i))$ and:
    (i) $R_{t_{i+1}}(\mathrm{notAT}(x,y_i))$

[3] At first sight, entailments are simple conditionals: a statement S entails statement S’ when S is a condition for S’. When S is true, S’ cannot be false, but when S is false, S’ can be true or false. However, in Miller and Johnson-Laird’s usage, the negation of S necessitates the negation of S’ (see page 532). For example, if it is true that "x travels" then (1)(i) holds; if it is true that "x does not travel" then the negation of (i) must hold as well. If either antecedent, "it is true that ..", had been false, then the consequent could have been true or false. This is a proper usage of entailment, and I will adopt it in this paper.
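As a hedged illustration of definitions (1) and (1’), the following sketch evaluates both over a recorded sequence of places. The function names `travel_1` and `travel_1prime`, and the representation of an object's history as a list of place labels indexed by moment, are assumptions made for this example; they are not part of Miller and Johnson-Laird's formalism. The operator $R_t$ is approximated by reading the place recorded at index t, and the check at the final moment $t_m$ (which would need $t_{m+1}$) is omitted.

```python
from typing import Hashable, Sequence


def travel_1(places: Sequence[Hashable], t: int, i: int, y: Hashable) -> bool:
    """Definition (1): x is at place y at exactly one of the moments t-i and t."""
    if i <= 0 or t - i < 0 or t >= len(places):
        return False
    return (places[t - i] == y) != (places[t] == y)


def travel_1prime(places: Sequence[Hashable], t0: int, tm: int) -> bool:
    """Definition (1'), simplified: between t0 and tm the place occupied at
    each moment is no longer occupied at the next moment."""
    if t0 < 0 or tm >= len(places) or t0 >= tm:
        return False
    return all(places[k + 1] != places[k] for k in range(t0, tm))


# Hypothetical history: x rests at "a", passes through "b", then rests at "c".
history = ["a", "a", "b", "c", "c"]
print(travel_1(history, t=2, i=1, y="b"))   # arrival at "b" between moments 1 and 2
print(travel_1prime(history, t0=1, tm=3))   # continuous change of place
print(travel_1prime(history, t0=0, tm=3))   # includes a moment of rest
```

Note that `travel_1` is satisfied by a single change of place between two moments, while `travel_1prime` holds over any sub-interval of continuous movement, which is one way to see why neither definition singles out one moment as the point of action definition.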
$\mathrm{TRAVEL}_{(1')}$ represents a simple motion which is, conceptually, the simplest form of action. But this is still ambiguous in point of action definition because, by continuously changing the focal place, $y$ with $y_i$, it now describes a primitive motion (with no fixed beginning or ending) and thereby no point of definition or, perhaps, an entirely filled sequence of them. Later I will show a way to resolve this problem and derive the primitive action from the primitive motion described in $\mathrm{TRAVEL}_{(1')}$.

The sense of primitive action is that actions of greater complexity are built of Boolean predications which directly or derivatively employ the primitive predicate. The predicate MOVE, from the English transitive verb "move", is the primitive TRAVEL made complex by a perception of agentive CAUSE. The verb "move" also has an intransitive usage, as in "he moves", which is not addressed here. Rather, we have:

(2) MOVE((x),y): Something x "moves" something y if:
    (i) TRAVEL(y)
    (ii) DO(x,S)
    (iii) CAUSE(S,(i))

The DO is, in Miller and Johnson-Laird (1976) and the views of many others, a placeholder for other movements. This placeholding is not without presumed consequence. It is a further manifestation of action "hierarchies," in this case a form of one action (the TRAVEL) subsuming whatever action(s) composed the DO. The relationship between the DO and the TRAVEL is potentially more complex than a CAUSE (Michotte, 1963, can be consulted for an appropriate perceptual analysis of causation). If the action is perceived as intentional, as this sort usually is, then the subsumptive relation is an IN-ORDER-TO relation: the actor did something in order to move the object. The IN-ORDER-TO relation has been worked out by analysis similar to Miller and Johnson-Laird (1976) in Schmidt, Sridharan and Goodson (1978).

The added entailments in (2) also add complexity to the question of resolving points of action definition.
Causation, for example, is perceived at a moment in time and could predict a point of action definition. However, other complex actions establish points of action definition in explicit ways and without causation. Consider the presuppositions and entailments for "reach" and "depart" in (3) and (4). The actions are momentary, at t, and each action presupposes a past, durative, activity as indicated by the statement forming operator $Q_t$.

(3) REACH(x,w): Something x "reaches" some place w if there is a moment t such that $Q_t((\mathrm{TOWARD}(\mathrm{TRAVEL}))(x,w))$ and:
    (i) $R_t(\mathrm{AT}(x,w))$

(4) DEPART(x,w): Something x "departs" some place w if there is a moment t such that $Q_t(\mathrm{notTRAVEL}(x))$ and:
    (i) $R_t(\mathrm{TRAVEL}(x))$

With these formulations, the referential ambiguity in TRAVEL is reduced by additional statement forming and presuppositional constraint. The point of action definition is suggested as the moment, t, at which the actor reaches a location or begins to travel.

The illustrations with "move," "depart," and "reach" demonstrate a taxonomic system for action we would like to have available to processes of action perception. However, the logico-linguistic analysis fails to support such a scheme for perception. We must look elsewhere for that analysis.

Bottom Up

Another method for approaching action definition carries us further from natural language considerations and closer to deriving conceptualizations from physical events. This previously unpublished definition was formulated by a colleague, John Rotondo, working with Newtson to develop mathematical (not computational) models. To set the stage for this method we make a simplifying assumption that it is possible to describe the effective input into the perceptual system as a sequence of states, S [4], where a state, $s_t$, is an interpreted image at a moment, t, in time.

The generative method recognizes two classes of change in a succession of states, S, at $s_t$: a simple state difference or first-order change, $c^1_t$, and a change in a change or second-order (derivative) change, $c^2_t$. The number of moments contributing to these changes is not specified: successive state differences can compute a constant curving movement, a uniform acceleration, or a constant cycle, as well as straight movements and constant velocities. A change in a change can just as well be a change in direction as a change from movement to no movement.

A generative description of the domain of possible descriptions of an event discretely continuous over time is obtained by evaluating all change descriptors for all moments in time. This generative description, it is claimed, generates the superset of all actions in an activity, as well as all motions. It takes little imagination to recognize that the number of generated actions and motions rapidly grows very large. Even with dramatic simplification, such as differencing between just two successive states, real world imagery produces enormous numbers of change descriptions over short periods of time.

We can, however, now view action perception as problem solving and formulate action schema in terms of a search method through a space of descriptions. Three reductions are used to limit the search:

• Points of action definition are drawn from the collection of second-order changes, $C^2$. This is called the action postulate.
• A systemization of schemas which recognize actions is provided. This is similar in spirit and form to Miller and Johnson-Laird (1976).
• A system for dynamically elaborating schema and making selected schema available to perceptual processes is provided, called the expectation mechanism.

The action postulate.
The action postulate is advantageous when it is important to monitor changes, such as a continuous movement toward a place, over some period of time, just as one monitors states, such as an object at rest. In describing the perceptual activity, as opposed to the conceptual activity, monitoring environmental invariances becomes important (Gibson, 1966).

[4] A two, two and a half, or three dimensional segmented, and perhaps conceptually labeled, representation of the visually encompassed world for a moment in time.

The action postulate was derived empirically [5]. Newtson’s studies (some of which are not published) have suggested that the action postulate, while correct in one sense, is only approximately precise. For example, in a film of a man hammering a nail into a block of wood, people routinely report the action slightly after the second-order change has occurred. People differ in selectivity: some will report actions only when the man hits the nail while others will see the man raising his hand as an action as well. Nevertheless, they are consistent in reporting the action several instants after the hammer stops on the nail or after the hand is at its full altitude and just begins to fall. My own observations have confirmed such a delay. The only work with the delay has used film speed variations to confirm that the delay is not simply reaction time delay: as film speed is increased, the delay is shortened, and, as it slows, the delay lengthens. Unfortunately, no systematic investigation of this delay for different classes of action has been undertaken to date.

[5] Newtson (1976) used the term "feature of change" rather than "second-order change" in his description of the postulate.

In the next section, we develop a system to represent the action postulate explicitly and account, in general terms, for the observed imprecision in the postulate.
The system derives particular action schema from a parent action schema in a fashion analogous to the systemization in Miller and Johnson-Laird.

Action schemata for perception.

The point in time where an action occurs provides concrete references for the undertaking in artificial perception, but, as a practical computational matter, it is good to know how the machine is enabled to see actions defined at particular moments in time from bottom-up cues. The cues I currently allow are the state descriptions, S, and first-order change descriptions, $C^1$, available to perception [6]. The major part of the computational study was in how to define action schemata which would map these bottom up cues to action conceptualizations.

I initiated the computational study without any presumption of a single primitive schema for actions. In fact, I set out to develop any number of elaborate finite state machines consistent with Newtson’s observations that could be employed to recognize different actions. These state machines were found to be largely compositions of a simpler machine which I now think of as a primitive or parent action schema. This parent action schema and its systemization is the topic of this section. The parent schema is defined as an automaton, thereby defining the schema in procedural terms, in the same spirit as Piaget’s (1963) formulations.

The systemization asserts that every action perceived is a manifestation of a modified instance of the parent action schema. In that manifestation, this instance has been evaluated against the sensory stream and has thereby successfully completed the instantiation of its associated action conceptualization.
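The bottom-up cues described here can be sketched in a few lines. This is a minimal illustration only, under strong assumptions that are not in the original: each state is reduced to the 2-D position of a single object, first-order change is the difference between just two successive states, and a second-order change is any difference between successive first-order changes. The names `first_order` and `second_order` are hypothetical.

```python
from typing import List, Tuple

Position = Tuple[float, float]


def first_order(states: List[Position]) -> List[Position]:
    """c1[k]: displacement from state k to state k+1 (two-state differencing)."""
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(states, states[1:])]


def second_order(c1: List[Position], eps: float = 1e-9) -> List[int]:
    """Indices of states at which the first-order change itself changes.

    Under the action postulate these are the candidate points of action
    definition; a stop, a start, and a change of direction all qualify.
    """
    points = []
    for k in range(1, len(c1)):
        dx = c1[k][0] - c1[k - 1][0]
        dy = c1[k][1] - c1[k - 1][1]
        if abs(dx) > eps or abs(dy) > eps:
            points.append(k)  # movement before and after state k differs
    return points


# Hypothetical track: move right, rest for two moments, then move up.
track = [(0, 0), (1, 0), (2, 0), (2, 0), (2, 0), (2, 1)]
changes = first_order(track)
print(changes)
print(second_order(changes))
```

Even in this toy case every moment yields a first-order descriptor, while only the stop and the restart yield second-order ones, which is the reduction the action postulate relies on.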
[6] For this paper, I set aside discussion of recursively allowing second-order change descriptions to influence the computation of new second-order change descriptions, as in "The man repeatedly went to get coffee".

The parent action schema has five parts: the declarative conceptualization which the automaton functions to instantiate and assert, two special conditions it monitors in the sensory stream, a procedural method for robustness against noise, and a procedural subsumption method for forming compositions of schemas which would signify a single conceptualization. These components, their modification and evaluation, are described in detail below:

• A Conceptualization, CON, to stand for the action at the moment it is conceptualized.
• A Criterial Component, COLLECT, which is a Boolean composition of simple state and first-order change descriptions to be detected in the sensory stream. This component is not the only referential component criterial to an action, but it is the only referential component criterial to the point of definition of the action. All actions will be defined at the point in time the criterial component goes from true to false by reference. For example, a person breaking a glass might include monitoring the fall of the glass toward the floor. When the glass ceases falling, or perhaps moving (within a limited pattern), the action is defined.
• A Non-Criterial Component, ACHIEVE, which is a Boolean composition of state and first-order change descriptions to be detected in the sensory stream. This non-criterial component is consulted before and at the moment of action definition to determine whether the action itself is true or false (whether the glass broke or bounced). While this component is referentially criterial (it determines the truth of a proposition on referential or deictic grounds), it is not criterial in identifying the point of definition of the action.
• A Perceptual Error Correction Component which determines the tolerance in evaluating the change in truth value of the Criterial Component. This component represents a system of processes which modulate perceptual decisions. For example, it would guarantee that a brief stop in movement (owing to any error or weakness of man or machine) would be routinely ignored and not signal a false point of action definition [7].
• A Motion Linking Component, NEXT, which permits chaining action conceptualizations (viz., CON pointers) for "macro" definitions. This is an optional component. The presence of a NEXT predication can suspend a true-false decision through the ACHIEVE Component, and thereby it can suspend a point of action definition.

[7] The specific error correction system generally depends on the details of the implementation. In the implementation described later, the system associated two parameters with each action schema: (a) a minimum duration threshold for the COLLECT proposition to turn true (after verification begins) and (b) a maximum duration threshold over which the COLLECT proposition was not true before deciding the COLLECT proposition was in fact not true.

An example of the use of the NEXT component can be found in one of two ways of viewing a man hammering a nail (see Newtson, Rindner, Miller and LaCross, 1978): (a) simple, one schema, actions such as the hammer hitting the nail or the hand rising, and (b) a sequence of the hand rising then the hammer coming down and hitting the nail. The sequenced perception may be of use in stages of becoming skilled at perceiving. If the hand rising has no other effect than to allow it to fall, the action schema may be shortened to contain only the last part of the hammer hitting the nail.
A further reason for the NEXT is that the processes which adjust what actions are "seen" will no longer posit "the hand rising" as a relevant thing to monitor once the NEXT relationship is recognized between the rising and the hitting.

Instances of action schema can be derived for Miller and Johnson-Laird’s (1976) examples. I will use one state proposition, At(x,y), which signals that an object, x, is at a place y, and one first-order change proposition, Move(x), which signals that an object has changed its position in space. To recognize a TRAVEL requires at least two primitive instantiations of the parent schema to conform with the ambiguity in the entailment of (1). The instantiations are given in (5) and (6). The new representation fills designated slots in the schema described above.

(5) CON: $R_t(\mathrm{TRAVEL}_{(1)}(x))$
    COLLECT: At(x,y)
    ACHIEVE: notAt(x,y)

(6) CON: $R_t(\mathrm{TRAVEL}_{(1)}(x))$
    COLLECT: notAt(x,y)
    ACHIEVE: At(x,y)

The action postulate says that at the point in time, t, at which the COLLECT for (5) or (6) goes from true to false, there is a determination of the truth or falsity of the conceptualization, CON. In these cases, the conceptualizations, if ever recognized, will be true, since the COLLECT is the inverse of the ACHIEVE. A more serious problem with this formulation is that it violates the action postulate, since it only detects a first-order change (a change between two states), not a second-order change (a change in a change). So, like Miller and Johnson-Laird (1976), we are led to reject this formulation as unsound. $\mathrm{TRAVEL}_{(1)}$ conflicts with the action postulate.

Two different schemas are required to recognize the durative form of TRAVEL, $\mathrm{TRAVEL}_{(1')}$. It is not possible to report the ongoing change at an arbitrary moment: we must select the beginning of the travel, as in (7), or the end of the travel, as in (8).
(7) CON: $R_t(\mathrm{TRAVEL}_{(1')}(x))$
    COLLECT: notMove(x)
    ACHIEVE: Move(x)

(8) CON: $R_t(\mathrm{TRAVEL}_{(1')}(x))$
    COLLECT: Move(x)
    ACHIEVE: Move(x)

These schemas collect the absence of a change of position or the change of position. When either of these conditions changes (goes from true to false), the system reports that the object, x, has traveled. The apparently unambiguous $\mathrm{TRAVEL}_{(1')}$ has two forms, like $\mathrm{TRAVEL}_{(1)}$, but because of the imposition of a natural point of definition on the action of traveling. These, nevertheless, represent primitive conceptual actions by perceptual derivation. $\mathrm{TRAVEL}_{(7)}$ is virtually the same in semantic form as DEPART in (4). However, $\mathrm{TRAVEL}_{(8)}$ corresponds to a past tense version of the primitive movement $\mathrm{TRAVEL}_{(1')}$.

The current system provides for the delay between the physical moment, t-i, when the COLLECT proposition goes from true to false, and the response moment, t, when the point of definition occurs, in the simplest fashion possible. Because of the effect of noise, the perceptual error component allows some absorption in deciding whether the COLLECT proposition has gone from True to False. I assume that the same absorptive process that keeps the COLLECT proposition True in noise also accounts for the observed delay (perhaps as much as several seconds) in reporting that an action has occurred. Since the COLLECT monitoring rate is defined by reference to the environmental frame rate, not in absolute time, this explanation is also in agreement with the preliminary experimental findings when film projection rates are speeded or slowed.

Defining action schemas for recognition is clearly different from defining them for conceptual analysis, but the taxonomic principles of schema organization are potentially similar. Instantiations and compositions [8] of the instantiations of the parent schema provide the basis for taxonomic categorization of action schema.
Since conceptualizations are a product of schema recognition, the taxonomies can naturally reflect the taxonomies of conceptualizations. But this sort of taxonomic constraint is definitional and not very interesting. It would be interesting if we could also carry down the conceptual constraints (such as the "toward" modification on TRAVEL which contributes to REACH in (3)) to the COLLECT and ACHIEVE levels of appropriate action schemas. Formally, COLLECT carries presuppositions and ACHIEVE carries entailments: a True COLLECT proposition is required even to entertain the truth or falsity of the CON. But given that COLLECT is True, a True or False ACHIEVE proposition determines whether the CON is True or False. This is an important connection between the linguistic and perceptual analysis. The match also depends upon the first-order change and state descriptors that result from bottom-up processing. As a research tactic, these descriptors can be adjusted to guarantee compatibility. The most dominant factors in carrying out conceptual constraints are, then, the expectation mechanism and the presence of the required information in the physical events. These are the next topics.

Expectation Mechanism. The model of action perception requires the generation of pools of action schemas to act as independent finite state automata that monitor the state and first-order change information on a moment-by-moment basis. Action schemas monitor derivative information, not the physical events directly, inasmuch as their function is to respond to second-order changes. The relationship between action schemas and state and first-order change information is fundamentally a matching relationship, where current action schemas are repeatedly matched until the COLLECT proposition in any one goes from true to false and an action is defined.
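The matching relationship just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation of the system reported here: the names `ActionSchema`, `step`, and `run` are hypothetical, each moment is reduced to a dictionary of Boolean descriptors, and the `absorb` parameter is one assumed way of modelling noise absorption (the number of consecutive frames for which a False COLLECT is tolerated before an action is defined). The label TRAVEL_(7) stands for the schema numbered (7), and TRAVEL_(1) for schema (5).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Frame = Dict[str, bool]  # descriptor name -> truth value at one moment


@dataclass
class ActionSchema:
    """Hypothetical automaton with CON, COLLECT and ACHIEVE slots."""
    con: str
    collect: Callable[[Frame], bool]
    achieve: Callable[[Frame], bool]
    absorb: int = 0  # assumption: False frames tolerated as noise
    collecting: bool = field(default=False, init=False)
    misses: int = field(default=0, init=False)

    def step(self, frame: Frame) -> Optional[bool]:
        """Return None while no action is defined, else the truth of CON."""
        if self.collect(frame):
            self.collecting = True
            self.misses = 0
            return None
        if not self.collecting:
            return None  # the presupposition never held, nothing to decide
        self.misses += 1
        if self.misses <= self.absorb:
            return None  # treat the lapse as noise for now
        # COLLECT has gone from true to false: point of action definition.
        self.collecting = False
        self.misses = 0
        return self.achieve(frame)


def run(schemas: List[ActionSchema],
        frames: List[Frame]) -> List[Tuple[int, str, bool]]:
    """Match every schema against every frame and collect the reports."""
    reports = []
    for t, frame in enumerate(frames):
        for schema in schemas:
            verdict = schema.step(frame)
            if verdict is not None:
                reports.append((t, schema.con, verdict))
    return reports


# Schema (7): collect the absence of movement, achieve movement.
travel_7 = ActionSchema("TRAVEL_(7)",
                        collect=lambda f: not f["Move"],
                        achieve=lambda f: f["Move"])
# Schema (5): collect At(x,y), achieve notAt(x,y).
travel_1 = ActionSchema("TRAVEL_(1)",
                        collect=lambda f: f["At"],
                        achieve=lambda f: not f["At"])

frames = [{"Move": False, "At": True},
          {"Move": False, "At": True},
          {"Move": True, "At": False},
          {"Move": True, "At": False}]
print(run([travel_7, travel_1], frames))
```

Because `absorb` is counted in frames rather than seconds, any reporting delay it introduces scales with the frame rate, which is the property the absorption account relies on.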
The expectation mechanism has the role of identifying the current subset of action schema which are, for one reason or another, worthy candidates for such matching. The expectation mechanism may both generate and select action schemas. There are reasons to limit the set of current action schemas to a subset of action schemata.

• To do a physical implementation which operates in something close to real time, it is of course necessary to limit the set size.

• Action schemas are all instantiations and compositions of instantiations of a most general action schema. There is no requirement for complete instantiations. For example, there might be a general "open a door" schema that matches anyone opening a door. If two schemas are available which differ only in specificity, then only the least specific schema requires matching. I know of one empirical observation that supports the availability of incompletely instantiated action schemata: if something unexpected occurs in an activity, the temporal density of points of action definition increases abruptly (Newtson, 1973; Newtson and Rindner, 1979). This effect has been called a primacy effect; it is more like a reorientation effect. It suggests that people initially use more abstract action schema to pick up on the "micro-actions" in order to make initial sense of what is going on.

• There are logical constraints on schema relevance: an "open a door" schema need only be monitored when a door is in the picture. There needs to be a mechanism which obviates the need for a "close the door" schema to be monitored when the door is already closed.

• In addition to the relevance of partial matching and logical constraints, perceptual confusions are a functional reason for selectivity. Preliminary empirical work suggests this functional basis may be a most significant one.

[8] Using the NEXT mechanism.
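Two of the constraints listed above, logical relevance and the preference for the least specific schema, can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the `Candidate` record, the `select_expectations` function, the scene descriptors `door_visible` and `door_closed`, and the representation of specificity as a set of filled slots.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Scene = Dict[str, bool]  # hypothetical state descriptors for the current scene


@dataclass
class Candidate:
    name: str
    action: str                        # general action this schema instantiates
    bindings: FrozenSet[str]           # filled slots; fewer means less specific
    relevant: Callable[[Scene], bool]  # logical constraint on relevance


def select_expectations(pool: List[Candidate], scene: Scene) -> List[Candidate]:
    """Keep relevant schemas, dropping any that only add specificity."""
    relevant = [c for c in pool if c.relevant(scene)]
    chosen = []
    for c in relevant:
        # c is redundant if a relevant schema for the same action
        # fills a proper subset of c's slots.
        dominated = any(o.action == c.action and o.bindings < c.bindings
                        for o in relevant)
        if not dominated:
            chosen.append(c)
    return chosen


def door_can_open(s: Scene) -> bool:
    return s["door_visible"] and s["door_closed"]


def door_can_close(s: Scene) -> bool:
    return s["door_visible"] and not s["door_closed"]


POOL = [
    Candidate("open-door", "open", frozenset(), door_can_open),
    Candidate("operator-opens-door", "open",
              frozenset({"agent=operator"}), door_can_open),
    Candidate("close-door", "close", frozenset(), door_can_close),
]

scene = {"door_visible": True, "door_closed": True}
print([c.name for c in select_expectations(POOL, scene)])
```

With a closed door in view, only the general "open a door" schema survives: the "close the door" schema is logically irrelevant, and the operator-specific schema differs from the general one only in specificity.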
Over the long term, a variety of context mechanisms are important to reduce the complexity of action schema. The fact that a COLLECT proposition represents any arbitrarily complex Boolean predication of state and first-order change descriptors does not assure that any arbitrarily complex Boolean predication of such descriptors is in fact sufficient to distinguish one action from another. Perception is selective in profound ways: the same activity and the same physical manifestation may have different significance on different occasions. It is well recognized in epistemology that one can take virtually any action one can conceive of and make up an example: "closing the door" may alternatively be seen as "pushing the door as far as it will go" (with exactly the same schema except for the CON designation). The difference may be in some higher intention: in the example, containing something versus getting the door out of the way.

The NEXT component was designed specifically for this reason. In the hammering example, raising the hammer then (NEXT) bringing it down, seen as a single action by a perceptual sequence, may, with further accommodation, be seen as hitting the table with the hammer. Also recall the DO in linguistic semantics: raising the hammer and lowering it is what the actor did in order to hit the table. This problem has been extensively addressed from an AI perspective by Schmidt and Sridharan (Schmidt, Sridharan and Goodson, 1978; Sridharan and Schmidt, 1978). I have, in effect, adopted their resolution, which posits the use of an expectation mechanism.

A reasonable framework for talking about an expectation mechanism is in terms of a generalized rule system, or production system (Thibadeau and Just, 1982). The function of the rules in the production system is to adjust the expectation set. The only adjustment which can be made is the addition or subtraction of action schemas to and from the expectation set.
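As a rough sketch of this framework, a rule can be modelled as a trigger paired with a set of schemas to add and a set to remove. The Python below is illustrative only: `Rule`, `adjust`, and the schema names in the hammering rules are hypothetical, and a production system in the sense of Thibadeau and Just (1982) would have richer conditions than a single action name.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Rule:
    """Hypothetical production: when `trigger` is recognized, edit the set."""
    trigger: str                       # name of a recognized action (a CON)
    add: FrozenSet[str] = frozenset()
    remove: FrozenSet[str] = frozenset()


def adjust(expectations: Set[str], action: str, rules: List[Rule]) -> Set[str]:
    """Return the expectation set after every rule matching `action` fires."""
    updated = set(expectations)
    for rule in rules:
        if rule.trigger == action:
            updated -= rule.remove   # subtraction of action schemas
            updated |= rule.add      # addition of action schemas
    return updated


# Assumed rules for the hammering example: after the hammer is raised,
# start monitoring its lowering; after it is lowered, monitor the composite
# "hit" action and stop monitoring the two component movements.
RULES = [
    Rule("RAISE(hammer)", add=frozenset({"LOWER(hammer)"})),
    Rule("LOWER(hammer)",
         add=frozenset({"HIT(table, hammer)"}),
         remove=frozenset({"RAISE(hammer)", "LOWER(hammer)"})),
]

expectations = {"RAISE(hammer)"}
expectations = adjust(expectations, "RAISE(hammer)", RULES)
expectations = adjust(expectations, "LOWER(hammer)", RULES)
print(sorted(expectations))
```

Restricting rules to set addition and subtraction keeps the matcher itself unchanged: it simply iterates over whatever schemas are currently in the set.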
Within this framework, I propose a new postulate, the expectation postulate: the expectation set is adjusted only at points of action definition. The rules in the production system add or delete action schemas from the expectation set in response to actions. One of the reasons for identifying a production system as a useful preliminary model for an expectation generator is that it is structured to respond quickly to new demands (Erman and Lesser, 1978; Tanimoto, 1982). The information available to the rules in the production system can be considerably richer than simply the most current action recognized and the set of all action schemas. Furthermore, the rules can posit intermediate results and can apply with intermediate effects other than simply the addition and subtraction of action schemas for the action recognition matcher.

A potential practical problem with the expectation postulate is that artificial perception could be effectively blind to unexpected actions, at least for a significant period of time. To reduce blindness, a proper role of the action recognition matcher would be to carry an implicit action schema that can signal significant activity not being accounted for by any action schema. This could be given by an action such as TRAVEL_(7)(x), signaled at the beginning of a movement, with a free variable for x.

From one perspective, the action TRAVEL_(7)(x), once recognized, is treated like any other recognized action: the rule system provides additions and subtractions of action schema in response to the action. It is likely that having the "wired in" accounting mechanism alone is more desirable than a "wired in" TRAVEL_(7)(x) schema. An explicit loading of situationally sensitive TRAVEL_(7)-type schema is preferable. Such schemas do not respond unless the second-order change they detect is not accounted for in any other action schema (they always compute off residual state and first-order change descriptions).
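The idea of computing off residual descriptions can be made concrete with a small Python sketch. It is an assumption-laden illustration rather than the mechanism of the system described: the function `residual_travel_onsets`, the per-object Move dictionaries, and the `accounted` set (objects whose change was already explained by some other recognized action at this moment) are all hypothetical.

```python
from typing import Dict, List, Set


def residual_travel_onsets(prev_move: Dict[str, bool],
                           curr_move: Dict[str, bool],
                           accounted: Set[str]) -> List[str]:
    """Report TRAVEL_(7)(x) for movement onsets no other schema explains.

    prev_move and curr_move map object names to the Move(x) descriptor at
    the previous and current moment. An onset requires that notMove(x) was
    observed at the previous moment and Move(x) holds now.
    """
    onsets = []
    for obj in sorted(curr_move):
        was_still = obj in prev_move and not prev_move[obj]
        began = was_still and curr_move[obj]
        if began and obj not in accounted:
            onsets.append(f"TRAVEL_(7)({obj})")
    return onsets


prev_move = {"hand": False, "door": False, "cart": True}
curr_move = {"hand": True, "door": True, "cart": True}
# Assume an expected schema has already accounted for the hand's movement.
print(residual_travel_onsets(prev_move, curr_move, accounted={"hand"}))
```

In this example only the door is reported: the cart was already moving, so there is no second-order change, and the hand's onset is claimed by another schema.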

Journal:

Cognitive Science

Volume 10

Publication date: 1986
